Are Sparse Representations Rich Enough for Acoustic Modeling?
Authors
Abstract
We propose a novel approach to acoustic modeling based on recent advances in sparse representations. The key idea in sparse coding is to compute a compressed local representation of a signal via an over-complete basis, or dictionary, that is learned in an unsupervised way. In this study, we treat the speech spectrogram as the raw “signal”, compute its local sparse codes, and use them as features in a standard phone classification task. A linear classifier operates directly on the coding space to make the classification decision. The simplicity of the linear classifier lets us assess whether the sparse representations are sufficiently rich to serve as effective acoustic features for discriminating speech classes. Our experiments demonstrate competitive error rates compared to other shallow approaches. An examination of the dictionary learned during sparse feature extraction shows that collections of dictionary entries capture meaningful acoustic-phonetic properties.
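The pipeline described in the abstract (sparse codes over an over-complete dictionary, fed directly to a linear classifier) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the dictionary `D` is random rather than learned, the sparse codes are computed with ISTA (iterative soft-thresholding), the classifier is a least-squares fit on one-hot targets, and the data are synthetic vectors rather than spectrogram patches.

```python
import numpy as np

def soft_threshold(x, t):
    """Element-wise soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista_sparse_code(X, D, lam=0.1, n_iter=200):
    """Approximately minimise 0.5*||x - D a||^2 + lam*||a||_1 for each row x of X.

    X: (n_samples, n_features); D: (n_features, n_atoms) with unit-norm columns.
    Returns the codes A with shape (n_samples, n_atoms).
    """
    step = 1.0 / np.linalg.norm(D, 2) ** 2   # 1 / Lipschitz constant of the gradient
    A = np.zeros((X.shape[0], D.shape[1]))
    for _ in range(n_iter):
        grad = (A @ D.T - X) @ D             # gradient of the quadratic term
        A = soft_threshold(A - step * grad, step * lam)
    return A

# Hypothetical toy setup: 20-dimensional "frames", 40 atoms (over-complete).
rng = np.random.default_rng(0)
n_features, n_atoms, n_per_class = 20, 40, 50

# Assumption: a random unit-norm dictionary stands in for a learned one.
D = rng.standard_normal((n_features, n_atoms))
D /= np.linalg.norm(D, axis=0)

def make_class(atom_ids, n):
    """Synthetic samples built from a few atoms plus a little noise."""
    coeffs = np.zeros((n, n_atoms))
    coeffs[:, atom_ids] = rng.uniform(0.5, 1.5, size=(n, len(atom_ids)))
    return coeffs @ D.T + 0.01 * rng.standard_normal((n, n_features))

X = np.vstack([make_class([0, 1, 2], n_per_class),
               make_class([3, 4, 5], n_per_class)])
y = np.repeat([0, 1], n_per_class)

# Step 1: sparse codes serve as the features.
A = ista_sparse_code(X, D)

# Step 2: linear classifier (least squares on one-hot targets, with a bias column).
A_b = np.hstack([A, np.ones((A.shape[0], 1))])
targets = np.eye(2)[y]
W, *_ = np.linalg.lstsq(A_b, targets, rcond=None)

# Accuracy on the training data only; this toy has no held-out set.
accuracy = np.mean(np.argmax(A_b @ W, axis=1) == y)
print(f"training accuracy: {accuracy:.2f}")
```

Because the classifier is linear, any class separation it achieves in this sketch has to come from the codes themselves, which mirrors the reasoning the abstract gives for choosing a simple classifier.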
Similar resources
Image Classification via Sparse Representation and Subspace Alignment
Image representation is a crucial problem in image processing, where there exist many low-level representations of images, e.g., SIFT, HOG and so on. But there is a missing link between low-level and high-level semantic representations. In fact, traditional machine learning approaches, e.g., non-negative matrix factorization, sparse representation and principal component analysis are employed to d...
Face Recognition in Thermal Images based on Sparse Classifier
Despite recent advances in face recognition systems, they suffer from serious problems because of the extensive types of changes in human face (changes like light, glasses, head tilt, different emotional modes). Each one of these factors can significantly reduce the face recognition accuracy. Several methods have been proposed by researchers to overcome these problems. Nonetheless, in recent ye...
Sparse representation with temporal max-smoothing for acoustic event detection
In order to incorporate long temporal-frequency structure for acoustic event detection, we have proposed a spectral patch based learning and representation method. The learned spectral patches were regarded as acoustic words which were further used in sparse encoding for acoustic feature representation and modeling. In our previous study, during feature encoding stage, each spectral patch was e...
Noise-robust Automatic Speech Recognition with Exemplar-based Sparse Representations Using Multiple Length Adaptive Dictionaries
In this work, we apply our recently proposed sparse-representation-based speech recognition system to the small vocabulary track of the 2nd ‘CHiME’ Speech Separation and Recognition Challenge. This system uses exemplars of different length to approximate noisy speech segments as a linear combination of the speech and noise exemplars with sparse weights. The exemplars are labeled speech segments ...
Pooling Robust Shift-Invariant Sparse Representations of Acoustic Signals
In recent years, designing the coding and pooling structures in layered networks has been shown to be a useful method for learning high-level feature representations for visual data. Yet, such learning structures have not been extensively studied for audio signals. In this paper, we investigate different pooling strategies based on the sparse coding scheme and propose a temporal pyramid pooling...